Statistical distributions of quasi-optimal paths in the traveling salesman problem: the role of the initial distribution of cities
Solutions to the Traveling Salesman Problem have been obtained with several
algorithms. However, few studies have discussed the statistical distribution
of lengths along the quasi-optimal path obtained. For a random set of cities
this distribution follows a rank 2 daisy model, but our analysis of actual
distributions of cities does not show the quadratic growth characteristic of
this daisy model. The role played by the initial distribution of cities is
explored in this work.

Keywords: Traveling Salesman Problem, distribution of cities, statistical properties.
PACS numbers: 89.75.-k,89.65.-s,02.50.-r,05.40.-a 



I. INTRODUCTION 

Statistical approaches to complex problems have been successful in a wide
range of areas, from complex nuclei [1] to the statistical program for complex
systems (see, for instance, Ref. [?]), including the fruitful analogies with
statistical mechanics. The main goal is to find universal properties, i.e.,
properties that do not depend on the specifics of the system treated but only
on a few symmetries or general considerations. An example of such an approach
is Random Matrix Theory (RMT), which has been applied successfully to wave
systems with typical lengths ranging from one femtometre to one metre.

Attempts to apply an RMT approach to a non-polynomial problem such as the
Euclidean Traveling Salesman Problem (TSP) are no exception [?]. For an
ensemble of randomly distributed cities, general statistical properties
appear, and they are well described by a daisy model of rank 2 [?]. However,
in realistic situations the specific system seems to be important, as we shall
see later, and the initial conditions rule the statistical properties of the
solutions. In the present work we perform a similar analysis both for actual
TSP maps of several countries around the planet and for some toy models that
could help to understand the role of the initial conditions in the transition
to universal statistical properties. An interesting link appears with the
distribution of the corporate vote when we map the lengths in the TSP to the
number of votes, after a proper normalization.

The paper is organized as follows. In Section II we define the TSP and the
statistical measures we shall use, paying special attention to the separation
of secular and fluctuating properties; there we also analyze maps of actual
countries. In Section III we discuss the transition from a map defined on a
rectangular grid to a randomly distributed one, using two random perturbations
of the city positions: i) a uniform random distribution of width $\sigma$ and
ii) a Gaussian distribution. In the same section we discuss the relation of
the latter toy model to the distributions of the corporate vote. The
conclusions appear in Section IV.



II. STATISTICAL PROPERTIES OF 
QUASI-OPTIMAL LENGTHS 

In the Traveling Salesman Problem a seller visits $N$ cities with given
positions $(x_i, y_i)$, returning to her or his city of origin. Each city is
visited only once and the task is to make the circuit as short as possible.
This problem belongs to the category of so-called NP-complete problems, for
which the computational time of the known exact algorithms increases
exponentially with $N$. It is also a minimization problem with the property
that the objective function has many local minima.

Several algorithms exist to solve it, and the development of more efficient
ones is a matter of current research. Since we are interested in the
statistical properties of the quasi-optimal paths, small differences between
the algorithms are not relevant and we shall consider all of them as
equivalent. The computational time is irrelevant as well. In this paper we
used the results of Concorde [?] for the actual-country TSP, and simulated
annealing [7] and 2-optimal [?] for the analysis of specific models.
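
As an illustration of the kind of local-search heuristic mentioned above, the following is a minimal sketch of a 2-optimal (2-opt) improvement loop. It is not the implementation used for the results in this paper; the function names `tour_length` and `two_opt`, the starting tour (cities in input order) and the `max_sweeps` cut-off are assumptions made only for this example.

```python
import numpy as np

def tour_length(points, tour):
    """Total length of the closed tour visiting `points` in the order `tour`."""
    ordered = points[tour]
    diffs = ordered - np.roll(ordered, -1, axis=0)
    return float(np.sqrt((diffs ** 2).sum(axis=1)).sum())

def two_opt(points, tour=None, max_sweeps=50):
    """Shorten a closed tour by reversing segments while that helps (2-opt)."""
    n = len(points)
    tour = np.arange(n) if tour is None else np.array(tour)

    def dist(a, b):
        return float(np.hypot(*(points[a] - points[b])))

    for _ in range(max_sweeps):          # hypothetical cut-off on full sweeps
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue             # these two edges share city tour[0]
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a,b) and (c,d) by (a,c) and (b,d) if shorter.
                if dist(a, c) + dist(b, d) < dist(a, b) + dist(c, d) - 1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1].copy()
                    improved = True
        if not improved:
            break
    return tour
```

The lengths $l_i$ along the resulting tour are obtained from consecutive cities in `points[tour]`; the result is a local minimum of the tour length, in line with the many local minima of the objective function noted above.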

The first step in the analysis consists in separating the fluctuating
properties from the secular ones in the data. This process can be nontrivial
[4]. The idea behind this procedure is that all the peculiarities of the
system reside in the secular part, while the information carried by the
fluctuations has a universal character. In the energy spectra of many quantum
systems, when almost all the symmetries are broken, this kind of analysis has
shown that the fluctuations are universal and depend only on the existence or
not of a global symmetry such as time reversal invariance. The peculiarities
of the system, whether it is a many-particle nucleus, an atom in the presence
of a strong electromagnetic field, or a billiard with chaotic classical
dynamics, are all contained in the secular part [?]. In the present case, we
assume that the dynamics that rules where the cities are located is
sufficiently complex to admit this kind of analysis. If it is not, as we shall
see later, the next step in our analysis is to search for the reason for such
a lack of universality.

In order to perform this separation we consider the density of cumulative
lengths, $d$,

$$\rho(d) = \sum_k \delta(d - d_k),$$

with $\delta(\cdot)$ the Dirac delta function. The cumulative lengths $d_k$
are ordered as they appear in the quasi-optimal path and are defined as
$d_k = \sum_{i=1}^{k} l_i$, with
$l_i = \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2}$ being the length between
cities $i-1$ and $i$. The cities are located at $(x_i, y_i)$ in the $X$-$Y$
plane. The corresponding cumulative density is

$$\mathcal{N}(d) = \sum_k \Theta(d - d_k),$$

with $\Theta(\cdot)$ the Heaviside function. The task is to separate
$\mathcal{N}(d)$ as
$\mathcal{N}_{\rm secular}(d) + \mathcal{N}_{\rm fluctuations}(d)$. The
secular part is calculated using a polynomial fit of degree $n$. After this,
the variable to analyze is the mapped one,

$$\xi_k = \mathcal{N}_{\rm secular}(d_k).$$

No.  Country      N cities   l_max (km)   Average length (km)
  1  Burma           33708      8650.00                197.00
  2  Canada           4663     76133.33              10263.28
  3  China           71009     51646.00               4225.62
  4  Egypt            7146      6708.34                431.90
  5  Ireland          8246      4059.17                973.62
  6  Finland         10639     10200.01                536.04
  7  Greece           9882      6600.02                286.97
  8  Italy           16862     10050.00                438.46
  9  Japan            9847     16166.68                581.44
 10  Kazakhstan       9976     38925.00               5559.56
 11  Morocco         14185      8056.41                418.08
 12  Oman             1979      3685.56                541.17
 13  Sweden          24978      9250.02                368.03
 14  Tanzania         6117     10166.68               1145.41
 15  Vietnam         22775      4983.36                145.26
 16  Yemen            7663     10810.28                877.26

TABLE I: Countries considered for the study. The quasi-optimal path was
obtained from the web page in Ref. [?]. Maximum and average length (in km)
are reported too.

We shall study the distributions of the set of numbers $\{\xi_k\}$. Notice
that this transformation makes the average spacing between consecutive
$\xi_k$ equal to one. The analysis is performed on windows of different
sizes, since this kind of analysis is always of local character. For
historical reasons this spreading procedure is named unfolding. Of all the
statistics we shall focus on the nearest neighbor distribution, $P(s)$, with
$s_i = \xi_{i+1} - \xi_i$ (which is the normalized length), and on the number
variance, $\Sigma^2(L)$, for the short and long range correlations,
respectively. $\Sigma^2(L)$ is the variance of the number of levels $\xi_n$
in a box of size $L$; see Refs. [1, ?] for a more detailed explanation.
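
The unfolding step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the results reported here: the names `unfold` and `unfolded_spacings` are hypothetical, the cumulative lengths are rescaled to [0, 1] before the fit purely for numerical conditioning, and the spacings are renormalized inside each window so that their mean is exactly one.

```python
import numpy as np

def unfold(lengths, degree=4):
    """Map cumulative lengths d_k onto xi_k = N_secular(d_k).

    N_secular is a polynomial of the given degree fitted to the staircase
    N(d_k) = k.  Rescaling d to [0, 1] is an assumption of this sketch that
    keeps the least-squares fit well conditioned.
    """
    d = np.cumsum(np.asarray(lengths, dtype=float))
    k = np.arange(1, d.size + 1)
    x = d / d[-1]
    coeffs = np.polyfit(x, k, degree)
    return np.polyval(coeffs, x)

def unfolded_spacings(lengths, degree=4, window=200):
    """Spacings s_i = xi_{i+1} - xi_i, unfolded window by window."""
    lengths = np.asarray(lengths, dtype=float)
    spacings = []
    for start in range(0, lengths.size - window + 1, window):
        xi = unfold(lengths[start:start + window], degree)
        s = np.diff(xi)
        spacings.append(s / s.mean())   # assumption: enforce <s> = 1 per window
    return np.concatenate(spacings)
```

A histogram of the result, for instance `np.histogram(s, bins=..., density=True)`, then gives an estimate of $P(s)$. A polynomial fit is not guaranteed to be monotonic, so a few spacings may come out negative for badly chosen degrees or windows; this is one reason why the parameters have to be varied until the statistics stabilize.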

The actual maps considered in this work are those reported on Concorde's web
page [?]; we selected those that have more than 1000 cities and present no
duplications. The selected countries are listed in Table I.

The results for $\mathcal{N}(d)$ show strong variations, as expected. The
secular part was calculated with several window sizes and polynomial degrees,
looking for parameter values that stabilize the statistics. However, no
universal behavior appears. The graphs presented in Fig. 1 for the nearest
neighbor distribution were calculated using a fourth-degree polynomial fit
and windows of 200 lengths; the histograms have a bin size of 0.004. There we
show the distributions for all the countries in Table I. There is no single
distribution, even though some of the countries show an exponential decay, as
occurs with Finland (see Fig. 2). We analyzed the data with several window
sizes and polynomial degrees, with similar results.

No regularities were found in our analysis, but nearly all the histograms
show a maximum and some of them present a polynomial growth, $s^{\alpha}$, at
the beginning of the distribution. This can be seen for Burma (1), Japan (9)
and Sweden (13). Notice that the distributions have their maximum at
$s < 0.2$ whereas the average is at $s = 1$. Hence, all the distributions
present a long tail. However, the type of decay is diverse as well (see
Fig. 2). Some distributions present a clear exponential decay, like Finland.
Others present a mixed decay, as is the case of China. For the sake of
clarity we do not show all the cases in Fig. 2. Notice that the bin size in
Fig. 2 is larger than that used in Fig. 1; for this reason the polynomial
growth does not appear.

FIG. 1: (color online) Nearest neighbor distribution of normalized lengths
(histogram) for the quasi-optimal path in the countries listed in Table I.
For the sake of clarity, we shifted the distributions. The bin size is 0.004;
notice that the normalized average is $s = 1$.

We have tried to show countries of several sizes and urban configurations.
There are some countries with an almost constant urban density, like Sweden,
and others with long tails and a maximum at very short distances, like
Canada. Changes in the parameters of the analysis, i.e., window size and
degree of the fitting polynomial, do not yield a universal behavior. Recall
that this is not the case for an ensemble of randomly distributed cities, as
seen in Refs. [?, ?]. From all these results it is clear that the initial
distribution of cities plays a crucial role in the final quasi-optimal
result. This is the subject of the next section.



III. THE ROLE OF THE INITIAL 
DISTRIBUTION OF CITIES 

The statistical properties of quasi-optimal paths for an ensemble of randomly
distributed cities in the Euclidean plane are well described by the so-called
daisy model of rank $r = 2$ [?]. These models result from retaining every
$(r+1)$-th number of a sequence of random numbers $u_i$ that follows a
Poisson distribution, i.e., whose $n$-th nearest neighbor distribution has
the form

$$P(n, s) = \frac{s^{n-1}}{(n-1)!} \exp(-s),$$

with $s_i = u_{i+n} - u_i$; $n = 1$ corresponds to first nearest neighbors.
The rarefied sequence must be rescaled in order to recover the norm and the
proper average of the $n$-th nearest neighbor distributions.

FIG. 2: (color online) Nearest neighbor distribution of normalized lengths in
semi-log scale, for the quasi-optimal path in the countries referred to in
the inset. The bin size is 0.04.

FIG. 3: Examples of the quasi-optimal path on one map of model I, with values
of (a) $\sigma = 1.0$ and (b) $\sigma = 15.0$. For model II, the considered
values are (c) $\sigma = 5.0$ and (d) $\sigma = 10.0$. The maps consist of
27 x 27 cities.

For the general daisy model of rank $r$ we have the following expression for
the nearest neighbor distribution:

$$P_r(s) = \frac{(r+1)^{r+1}}{\Gamma(r+1)}\, s^{r} \exp[-(r+1)s], \tag{1}$$
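
The construction of a daisy model (retain every $(r+1)$-th point of a Poisson sequence and rescale) can be checked numerically against Eq. (1). The sketch below is an illustration only; the function names `daisy_spacings` and `daisy_pdf` are hypothetical, and the rescaling divides by the exact factor $r + 1$ rather than by a sample mean.

```python
import math
import numpy as np

def daisy_spacings(rank, n_spacings, rng=None):
    """Spacings obtained by keeping every (rank+1)-th point of a Poisson sequence."""
    rng = np.random.default_rng() if rng is None else rng
    # Poisson sequence with unit mean spacing between consecutive points.
    u = np.cumsum(rng.exponential(1.0, size=(rank + 1) * n_spacings + 1))
    retained = u[::rank + 1]
    # Rescale so that the mean spacing of the rarefied sequence is one again.
    return np.diff(retained) / (rank + 1)

def daisy_pdf(s, rank):
    """Nearest neighbor distribution of the daisy model of the given rank, Eq. (1)."""
    s = np.asarray(s, dtype=float)
    norm = (rank + 1) ** (rank + 1) / math.gamma(rank + 1)
    return norm * s ** rank * np.exp(-(rank + 1) * s)
```

Each retained spacing is a sum of $r + 1$ independent exponential variables, which is why the rescaled spacings have unit mean and variance $1/(r+1)$, in agreement with Eq. (1).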



FIG. 4: (color online) (a) Nearest neighbor distribution, $P(s)$, of
normalized lengths for model I for the values of $\sigma$ indicated in the
key. The continuous lines labeled with $r$ correspond to the distributions of
the daisy model of Eq. (1). (b) Same as (a) but in semi-log scale. Note that
the histogram for $\sigma = b/2 = 5.0$ is well fitted by the daisy model with
$r = 6$. (See text for details.)



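and

$$\Sigma^2_r(L) = \frac{L}{r+1} + \frac{r(r+2)}{6(r+1)^2} + \frac{2}{(r+1)^2} \sum_{j=1}^{r} \frac{W_j}{(1 - W_j)^2} \exp[(W_j - 1)(r+1)L] \tag{2}$$

for the number variance. Here $W_j = \exp(2\pi i j/(r+1))$ are the $r + 1$
roots of unity, of which only the $r$ non-trivial ones, $j = 1, \dots, r$,
enter the sum, and $i$ stands for $\sqrt{-1}$.

In the case of $r = 2$, the nearest neighbor distribution and the $\Sigma^2$
statistics reduce to

$$P_2(s) = \frac{27}{2}\, s^2 \exp(-3s), \tag{3}$$

and

$$\Sigma^2_2(L) = \frac{L}{3} + \frac{4}{27}\left[1 - \cos\!\left(\frac{3\sqrt{3}}{2} L\right) \exp\!\left(-\frac{9}{2} L\right)\right]. \tag{4}$$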
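
As a consistency check, the general number variance of Eq. (2) can be evaluated numerically and compared with the closed form of Eq. (4) for $r = 2$. The sketch below assumes that the sum in Eq. (2) runs over the $r$ non-trivial roots of unity, $j = 1, \dots, r$ (the root $W_0 = 1$ would make the summand singular); the function names are hypothetical.

```python
import numpy as np

def sigma2_daisy(L, rank):
    """Number variance of Eq. (2), summed over the non-trivial roots of unity."""
    L = np.atleast_1d(np.asarray(L, dtype=float))
    r1 = rank + 1
    idx = np.arange(1, r1)                       # j = 1, ..., rank
    W = np.exp(2j * np.pi * idx / r1)[:, None]   # column of complex roots
    osc = (W / (1 - W) ** 2 * np.exp((W - 1) * r1 * L[None, :])).sum(axis=0)
    value = L / r1 + rank * (rank + 2) / (6 * r1 ** 2) + 2 / r1 ** 2 * osc
    return value.real                            # imaginary parts cancel in the sum

def sigma2_rank2(L):
    """Closed form of Eq. (4) for the daisy model of rank 2."""
    L = np.atleast_1d(np.asarray(L, dtype=float))
    return L / 3 + 4 / 27 * (1 - np.cos(1.5 * np.sqrt(3) * L) * np.exp(-4.5 * L))
```

With this convention the two expressions agree for $r = 2$, the asymptotic slope is $1/(r+1)$, and setting the rank to zero returns the Poisson result $\Sigma^2(L) = L$.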



As mentioned above, quasi-optimal paths of an ensemble of maps with randomly
distributed cities nearly follow Eqs. (3) and (4). Again, the final
distribution of lengths in the quasi-optimal paths depends on the initial
distribution of cities.

In order to understand the role of the initial distribution of cities, we
start from a master map of cities on a square grid of side $b$, where the
initial position of each city is at an intersection. In this case, the
distribution of lengths for the quasi-optimal path is close to a delta
function with a small tail. Now we take an ensemble of maps, each built by
relocating the cities from their original positions using a probability
distribution of width $\sigma$, $V(\sigma)$. That is,

$$x_i = n b + p_i, \tag{5}$$

$$y_i = m b + q_i, \tag{6}$$



where the numbers $p_i$ and $q_i$ are drawn from $V(\sigma)$, which has zero
mean, and $n$ and $m$ are integers. Two distributions were selected: i) a
uniform one with width $\sigma$, which we call model I, and ii) a Gaussian
one with the same width, which we call model II. Several examples of the
quasi-optimal paths are shown in Fig. 3. The starting grid, or master
rectangle, defining the 27 x 27 cities is similar to that of Fig. 3(a).
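
The two toy models can be generated with a few lines of code. The sketch below is an illustration under stated assumptions: the function name `perturbed_grid` is hypothetical, the grid spacing defaults to $b = 10$ (the value implied by $\sigma = b/2 = 5.0$ in the figure captions), the uniform perturbation of model I is taken on $[-\sigma, \sigma]$ (total width $2\sigma$), and $\sigma$ is taken as the standard deviation of the Gaussian of model II.

```python
import numpy as np

def perturbed_grid(sigma, model="I", n_side=27, b=10.0, rng=None):
    """City positions x = n*b + p and y = m*b + q, as in Eqs. (5) and (6)."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = np.meshgrid(np.arange(n_side), np.arange(n_side), indexing="ij")
    grid = b * np.column_stack([n.ravel(), m.ravel()]).astype(float)
    if model == "I":
        # Uniform displacement in [-sigma, sigma]: zero mean, total width 2*sigma.
        shift = rng.uniform(-sigma, sigma, size=grid.shape)
    elif model == "II":
        # Gaussian displacement: zero mean, standard deviation sigma.
        shift = rng.normal(0.0, sigma, size=grid.shape)
    else:
        raise ValueError("model must be 'I' or 'II'")
    return grid + shift
```

An ensemble such as the one analyzed here would correspond to repeated calls with independent random numbers, each followed by a search for the quasi-optimal path on the resulting map.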

The distribution of lengths shows a transition from the original one to the
limiting case given by Eqs. (3) and (4), as can be seen in Fig. 4 for the
uniform distribution and in Fig. 5 for the Gaussian one. In these figures we
plot $P(s)$ in both (a) linear and (b) semi-logarithmic scale. The analysis
was performed on an ensemble of 500 maps of 27 x 27 cities each. The variable
$\sigma$ was considered in the interval [0, 15]. Only relevant values are
reported.

For model I, we plotted the histograms for $\sigma$ = 1, 4.5, 5 and 15.0 (see
Fig. 4). In the first case the distribution barely departs from the initial
one, but the quasi-optimal solution presents a revival for $s$ slightly below
$\sqrt{2}$ (Fig. 4(a), black histogram with circles) and for values slightly
larger than 2, reflecting the existence of diagonal lengths in the grid and
of lengths of order two $b$'s (in this uncorrelated variable, see Fig. 3(a)).
The histograms show a continuous transition to the distribution given by
Eq. (3) when $\sigma = 15$ (blue histogram with stars). In this case the tail
follows the daisy model of rank 2 very closely, and the start of $P(s)$ is
consistent with it (see Fig. 4(b)). The histogram in red with crosses
corresponds to $\sigma = 4.5$ and follows very closely the daisy model of
rank 7. Meanwhile, the histogram in green with diamonds, corresponding to
$\sigma = 5.0$, follows the rank 6 model. This latter value of $\sigma$ is the
one at which the distributions of the cities start to overlap, i.e.,
$\sigma = b/2$. When $\sigma = b$ the histogram coincides with the daisy
model of rank 3 (not shown).

For model II a transition also exists, and the fit to a daisy model is
defined as in model I. In Fig. 5 we plot the cases $\sigma = 5.0$ and 10.0,
which are close to daisy models of rank 3 and 2, respectively. In Fig. 5(b)
we replot them in semi-log scale in order to see the decay of the tail. The
interpretation is the same as before. Fluctuations in the tails are observed
for large values of $\sigma$ in the Gaussian case. The reason is that the map
admits cities positioned far away from the master rectangle (not shown).
Certainly, a fit using Weibull or Brody distributions is possible for both
models; however, these have no link with any physical model, whereas the
daisy models are related to the one-dimensional Coulomb problem [?].
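
One simple way to quantify which rank is closest to a given histogram is a least-squares comparison with the daisy distribution of Eq. (1) over a set of candidate ranks. The sketch below is only an illustration and is not claimed to be the procedure behind the ranks quoted in the text; the function name `best_daisy_rank`, the candidate ranks, the number of bins and the cut-off `s_max` are all assumptions.

```python
import math
import numpy as np

def best_daisy_rank(spacings, ranks=range(1, 11), bins=50, s_max=4.0):
    """Rank whose daisy P(s), Eq. (1), is closest to the histogram (least squares)."""
    hist, edges = np.histogram(spacings, bins=bins, range=(0.0, s_max), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    errors = {}
    for r in ranks:
        norm = (r + 1) ** (r + 1) / math.gamma(r + 1)
        model = norm * centers ** r * np.exp(-(r + 1) * centers)
        errors[r] = float(((hist - model) ** 2).sum())
    return min(errors, key=errors.get), errors
```

The function returns the best rank together with the squared error of every candidate, so that nearby ranks can be compared. Spacings beyond `s_max` are ignored by the histogram, which slightly biases the normalization when the tail is heavy.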

An interesting link exists in this context between the distribution of
lengths in the quasi-optimal path and the vote distribution through daisy
models. In Ref. [10] it was established that the distribution of the
corporate vote in Mexico during the elections from 2000 to 2006 follows a
daisy model with ranks from $r = 2$ to $r = 7$. In the present case,
exponential decays compatible with $r = 3$ occur at $\sigma = b$ for model I
and at $\sigma = b/2$ for model II, i.e., $\sigma = 10.0$ for the former and
$\sigma = 5.0$ for the latter. In both cases the overlap of the
random-perturbation distributions is significant. Other links look possible
and are under consideration. This behavior remains poorly understood, but it
opens new questions about the relationship between statistical mechanics
problems and social behavior.

FIG. 5: (color online) Same as the previous figure, but for model II and the
values of $\sigma$ and $r$ indicated. Notice that this time the value
$\sigma = b/2 = 5.0$ is fitted by a daisy model of rank $r = 3$.

For the long-range behavior, the analysis with the $\Sigma^2$ statistic looks
promising, but the asymptotic behavior does not coincide with that of the
daisy models. This statistic is highly sensitive to the unfolding procedure
described above. Wider studies are currently in progress; however, we can
anticipate that the $\Sigma^2$ statistic for model I at $\sigma = 15.0$ is
close to that of the daisy model of rank 2. The slope is
$0.317486 \pm 7.31 \times 10^{-5}$, which is close to the value 1/3 obtained
for the daisy model. For both models $\Sigma^2$ starts out following the
behavior of Eq. (4). For small values of $\sigma$ the numerical results show
an oscillatory behavior compatible with that of the daisy model, Eq. (2),
even when the asymptotic slope is not the correct one.
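
A direct numerical estimate of the number variance counts the unfolded levels that fall inside boxes of size $L$ and takes the variance of those counts. The sketch below is an illustration only; the function name `number_variance`, the random placement of the boxes and the default number of boxes are assumptions, not the procedure used for the slope quoted above.

```python
import numpy as np

def number_variance(xi, L, n_boxes=5000, rng=None):
    """Variance of the number of unfolded levels in randomly placed boxes of size L."""
    rng = np.random.default_rng() if rng is None else rng
    xi = np.sort(np.asarray(xi, dtype=float))
    # Box origins are drawn uniformly so that every box lies inside the data.
    starts = rng.uniform(xi[0], xi[-1] - L, size=n_boxes)
    counts = np.searchsorted(xi, starts + L) - np.searchsorted(xi, starts)
    return float(counts.var())
```

Evaluating this function on a range of box sizes and fitting a straight line to the large-$L$ values, for example with `np.polyfit(L_values, variances, 1)`, gives an estimate of the asymptotic slope, to be compared with $1/(r+1)$ for a daisy model of rank $r$.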



IV. CONCLUSIONS 

We presented a statistical approach to the Traveling Salesman Problem (TSP).
No universal behavior appears for the actual distributions of cities of
several countries worldwide, in contrast with the Euclidean TSP with a
uniform random distribution of cities [?, ?]. As a first step to understand
the role of the initial distribution of cities, we studied the nearest
neighbor distribution of the lengths of quasi-optimal paths for a model which
starts with a periodic distribution of cities on a grid that is perturbed by
a random fluctuation of width $\sigma$. We used two models for the
fluctuation: model I, a uniform distribution, and model II, a Gaussian one.
Both models evolve, as a function of $\sigma$, from a delta-like initial
distribution to one well described by a daisy model of rank 2 (see Eq. (3)).
As the width of the perturbing distribution is increased, the models present
nearest neighbor distributions compatible with several ranks of the daisy
model. Two values of the width are important. The first one is when the
random perturbation admits an overlap of the cities originally at the
periodic sites. For model I that occurs when $\sigma = b$ (the total width of
the distribution is $2\sigma$), $b$ being the spacing of the periodic
lattice. For model II that happens when $b = 2\sigma$, i.e., two standard
deviations of the Gaussian distribution. In these cases the histogram of
lengths fits a rank 3 daisy model. An interesting link appears when we notice
that such a daisy model fits the tail of the distribution of votes (for the
chambers) for a corporate party in Mexico during the election of 2006. The
reason for this coincidence remains open and requires further analysis. An
attempt in this direction is presented in Ref. [?]. Another open question is
whether an ensemble of countries worldwide has universal properties or not.
This topic is the subject of current research.